Mask Classifier - Version 3 - PART 1
The first part of a systematic and methodical approach to composing a Mask Classifier CNN model with near perfect accuracy.
- You can find the free to use, deployed application at https://protected-mountain-28715.herokuapp.com.
- Training Walkthrough
- First try
- First try - Review
- Second Try
- Second Try - Review
- Conclusion
Contact Information
Email: toump.nick@gmail.com GitHub: https://github.com/ntoump/ Youtube: https://www.youtube.com/channel/UCFgF1lHh0fRQxY9CqpyBZgw
I should mention that, in this Walkthrough, as well as in the deployed application (https://protected-mountain-28715.herokuapp.com), I use Multi-Label Classification alongside one-hot encoding, so that each image is not limited to one category as output. Instead, the theoretical maximum output is all categories on which the model has been trained (although that is extremely unlikely), and the theoretical minimum is none of these categories (as the model may not be confident enough in any of its predictions). In this last case, as highlighted in the Specification of the application, the model will output an error message.
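To make the multi-label behavior concrete, here is a minimal sketch (not the app's actual code) of how thresholded, per-category sigmoid outputs can yield several labels, one label, or none at all for a single image. The vocabulary and logit values below are hypothetical:

```python
import math

# Hypothetical category vocabulary and raw model outputs (logits) for one image
vocab = ['Cloth', 'Gas', 'KN95', 'Medical', 'No_mask!', 'Shield']
logits = [2.1, -3.0, -1.2, 1.5, -4.0, -2.5]

def sigmoid(x):
    # Each category gets an independent probability (no softmax competition),
    # which is what makes multi-label output possible
    return 1 / (1 + math.exp(-x))

thresh = 0.5
predicted = [c for c, p in zip(vocab, map(sigmoid, logits)) if p > thresh]

if predicted:
    print(predicted)  # here: both 'Cloth' and 'Medical' clear the threshold
else:
    # No category cleared the threshold; the deployed app shows an error message here
    print("No confident prediction")
```

Because each probability is judged independently, lowering the threshold tends to produce more labels per image, and raising it produces fewer, possibly zero.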
df = pd.read_csv('CSVs/only_images4.csv')
df
First try
This is my first try of "masksv3" (i.e. the 3rd Version of this application) at creating an even better model using Deep Convolutional Neural Networks, continuing exactly where "masksv2" left off.
In comparison to the previous Version, in this first try I only changed the dataset, filtering out many mislabeled images and adding a few, as well as customizing my previous dataset to support one-hot encoding. At the same time, everything else (e.g. the hyperparameters) is left as is.
After reviewing the final accuracy of this first model, I'll try to improve on that by tweaking the hyperparameters.
Wish me luck!
# Get dblock, dsets, and dls
# Notice that no Presizing is used (all credits to fastai for their incredible work) in this first DataBlock
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
splitter=RandomSplitter(),
get_x=get_x,
get_y=get_y,
item_tfms = RandomResizedCrop(128, min_scale=0.35))
dsets = dblock.datasets(df)
dls = dblock.dataloaders(df)
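The DataBlock above relies on `get_x` and `get_y`, which are not shown in the notebook. One plausible implementation, assuming the CSV has an image-path column (here hypothetically named `fname`) plus one 0/1 column per category (the one-hot encoding mentioned earlier), could look like this:

```python
import pandas as pd

# Hypothetical one-hot label columns matching the dataset's categories
label_cols = ['Cloth', 'Gas', 'KN95', 'Medical', 'No_mask!', 'Shield']

def get_x(row):
    # Return the path to the image file for this row
    return row['fname']

def get_y(row):
    # Return the list of category names whose one-hot flag is set;
    # MultiCategoryBlock accepts a list of labels per item
    return [c for c in label_cols if row[c] == 1]

# Tiny demonstration on a single hypothetical row
row = pd.Series({'fname': 'imgs/001.jpg', 'Cloth': 1, 'Gas': 0, 'KN95': 0,
                 'Medical': 0, 'No_mask!': 0, 'Shield': 0})
print(get_x(row), get_y(row))
```

fastai calls these two functions on each DataFrame row to build the (image, labels) pairs that the DataBlock turns into datasets and dataloaders.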
# Everything looks good
dls.show_batch()
I start off with a small threshold (0.2), ResNet50, and a semi-random base learning rate. Furthermore, I train using `fine_tune`, because it could be a good starting point to begin to understand the practical differences between the fastai model training tools (including `fit`, `fit_one_cycle`, and `fine_tune`).
learn = cnn_learner(dls, resnet50, metrics=partial(accuracy_multi, thresh=0.2))
learn.fine_tune(3, base_lr=3e-3, freeze_epochs=4)
The accuracy seems to have improved greatly within a few seconds (thanks to a P5000 NVIDIA GPU) and a few epochs of training, which indicates that there was plenty of room to grow.
Next, I plot the learning-rate-to-loss relationship and get the suggested learning rates, in order to train further with a better-informed lr guess.
learn.lr_find()
There does not seem to be a clear downward slope, so I'll just use lr_min.
learn.fine_tune(3, base_lr=0.0003019951749593019, freeze_epochs=4)
The accuracy has skyrocketed to almost 94%. Thus far, I can say that we've done pretty well.
lr_min, lr_steep = learn.lr_find()
Let's train some more.
learn.fine_tune(2, base_lr=lr_min)
lr_min, lr_steep = learn.lr_find()
Not bad! We just saw, for the first time, a decrease in accuracy (something that I was expecting, that's why I trained it only a few epochs per set). That's not necessarily a sign of overfitting. It would be a good opportunity, though, to get the predictions of the model, and plot the threshold to accuracy relationship.
preds,targs = learn.get_preds()
xs = torch.linspace(0.05,0.95,29)
accs = [accuracy_multi(preds, targs, thresh=i, sigmoid=False) for i in xs]
plt.plot(xs,accs);
The above plot shows a positive relationship between threshold and accuracy: the higher the chosen threshold, the higher the expected accuracy of the model. It is also important to note that there isn't a great difference between choosing, say, 0.3 and 0.85 as your threshold; perhaps not even a percentage point.
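Reading the best threshold off such a sweep can be done directly in code. The accuracy values below are made-up stand-ins for illustration; in the notebook they come from `accuracy_multi` over the validation predictions:

```python
# Rebuild the same 29-point threshold grid used in the sweep above
n = 29
xs = [0.05 + i * (0.95 - 0.05) / (n - 1) for i in range(n)]

# Toy accuracies: mildly increasing with threshold, as the plot suggested
accs = [0.90 + 0.05 * x for x in xs]

# Pick the threshold whose accuracy is highest
best_idx = max(range(n), key=lambda i: accs[i])
best_thresh = xs[best_idx]
print(f"best threshold ≈ {best_thresh:.2f}")
```

With a monotonically increasing toy curve the argmax lands on the last grid point; on real validation data the curve usually flattens or dips near the extremes, which is what makes this check worth running.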
First try - Review
I am fairly impressed by the model's performance. Reaching 95% accuracy, considering the variation of the ds*, the limited training time (& costs), and the uncustomized hyperparameter choices, is almost excellent.
Now onwards to manual testing.
I have a few dozen images, against which I will check my model's accuracy. These are the ones used for evaluating the masksv2 application's model as well.
*: The dataset contains, at this point, eight (7+1) distinct categories, namely: ['-', 'Cloth', 'Gas', 'KN95', 'Medical', 'No_mask!', 'Shield', 'Surgical'].
Each category refers to a type of face mask, except for "-" and "No_mask!". The last one suggests that there is a maskless face in the given image, whereas the first one ("-") is part of an experiment I conduct named "Intended Noise" (see below).
I have three groups of images for my manual testing: the "NEW" folder, the "NEU" folder, and the spare pictures you see below.
[The names of the images are meant to relate to their content, though that is not always the case.]
ls
# Built a quick loop to do the work for me
# Below you can see the predictions of the model
import os
pather = Path('/notebooks/Manual')
for img in os.listdir(pather):
    # Skip hidden files and the capitalized folders ("NEW", "NEU")
    if not img.startswith('.') and not img[0].isupper():
        print(img, learn.predict(pather/img))
for img in os.listdir(pather/'NEW'):
    print(img, learn.predict(pather/'NEW'/img))
for img in os.listdir(pather/'NEU'):
    print(img, learn.predict(pather/'NEU'/img))
Spare images
The model did not do particularly well with the spare images. I'll get back to that very soon.
"NEW"
The "NEW" folder contains images with many people wearing masks, rather than just one person at a time (at this point the dataset contained mostly images with a single mask each). It was while reviewing these results that I realized my blunder: I had included two categories in the dataset for the same type of mask (namely, "Surgical" and "Medical").
This was obviously done by accident, as I had not found a convenient API to clean the data (the fastai one does not work for multi-label classification), so I had to do it by hand. That gave me, nonetheless, an idea: what if I merged all "Surgical" images with the "Medical" labeled ones into a common category, named "Medical" (To be continued...)?
"NEU"
Now that's where the model really shined. You see, the images in this category are part of what I call "Intended Noise" (DISCLAIMER: I have not encountered this idea anywhere in DL, that's why I have taken the liberty of naming it, and I don't provide citation. If anyone knows if that's a thing, please reach out to me with the relevant sources!).
I'm not going to delve into that in this post. Keep an eye out for that, though, because a detailed approach to explaining this idea, as well as its potential benefits to DL models, will be posted soon.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
interp.plot_top_losses(41, nrows=10)
After interpreting the results (for some wicked reason I can't seem to get the confusion matrix to work with my ds) by viewing the top losses, I realize that the model misses a few easy ones, while being extremely accurate on useless datapoints (more on that later).
- It should be noted that I DID NOT train the first model until overfitting. I stopped mainly because I had ideas I deemed more valuable to explore, and because the threshold-to-accuracy relationship suggested the model was almost as good as it could get. Nowadays, I tend to dismiss such early stopping, given that these plotted relationships are but predictions.
Second Try
Firstly, I cleaned the ds, as I thought I had done earlier. I merged the "Surgical" and "Medical" categories into "Medical", and relabeled a few misclassified items.
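Such a merge can be sketched in pandas; the frame and column names below are hypothetical stand-ins for the real CSV, assuming one 0/1 column per category:

```python
import pandas as pd

# Hypothetical one-hot frame containing both the 'Surgical' and 'Medical' columns
df = pd.DataFrame({
    'fname': ['a.jpg', 'b.jpg', 'c.jpg'],
    'Surgical': [1, 0, 0],
    'Medical':  [0, 1, 0],
    'Cloth':    [0, 0, 1],
})

# Fold 'Surgical' into 'Medical': an image is 'Medical' if either flag was set
df['Medical'] = (df['Medical'] | df['Surgical']).astype(int)
df = df.drop(columns=['Surgical'])
print(df)
```

An elementwise OR is the natural merge for one-hot flags: no image loses a label, and duplicates (an image tagged both ways) collapse cleanly into a single 1.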
Next, I will slightly tweak the hyperparameters to try to beat the performance reached above (~95%), including cropping to a bigger size (224 to start with), using Presizing, using a deeper arch, and adjusting the threshold and lr.
Note: I accidentally deleted the 224-crop results during training. That run was slightly better than the accuracy reached above, though only by a percentage point (~96%). I have, however, included below the next stage: applying Presizing at a far higher resolution in the hopes of achieving near-perfect accuracy.
df = pd.read_csv('CSVs/only_images4.csv')
df
# Here is the main difference of this time; use of Presizing (720->460), in the hopes of higher accuracy due to higher resolution
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
splitter=RandomSplitter(seed=42),
get_x=get_x,
get_y=get_y,
item_tfms=Resize(720),
batch_tfms=aug_transforms(size=460, min_scale=0.75))
dsets = dblock.datasets(df)
# I got a CUDA memory error; that's why the batch size had to be smaller than 64, and I went with my usual choice of 32
dls = dblock.dataloaders(df, bs=32)
dls.show_batch()
# Threshold = 0.8 (instead of 0.2)
learn0 = cnn_learner(dls, resnet50, metrics=partial(accuracy_multi, thresh=0.8))
I first get lr_min, in order to better direct the training. A pretty much expected curve.
lr_min, lr_steep = learn0.lr_find()
t = torch.cuda.get_device_properties(0).total_memory
c = torch.cuda.memory_cached(0)  # deprecated in newer PyTorch; use torch.cuda.memory_reserved(0)
a = torch.cuda.memory_allocated(0)
# f = c-a # free inside cache
t, c, a
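Those three values come back as raw byte counts, which are hard to eyeball. A tiny helper converts them to gigabytes; the example numbers below are hypothetical, in the ballpark a 16 GB P5000 would report:

```python
def to_gb(nbytes):
    # Convert a raw byte count to GiB, rounded for readability
    return round(nbytes / 2**30, 2)

# Hypothetical total / cached / allocated values, in bytes
total, cached, allocated = 17_071_734_784, 9_126_805_504, 8_589_934_592
print(to_gb(total), to_gb(cached), to_gb(allocated))
```

Comparing `cached` against `total` is what tells you how close a batch size is to triggering the CUDA out-of-memory error mentioned above.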
learn0.fine_tune(3, base_lr=lr_min, freeze_epochs=3)
lr_min, lr_st = learn0.lr_find()
learn0.fine_tune(2, base_lr=lr_min, freeze_epochs=3)
preds,targs = learn0.get_preds()
xs = torch.linspace(0.01,0.99,350)
accs = [accuracy_multi(preds, targs, thresh=i, sigmoid=False) for i in xs]
plt.plot(xs,accs);
learn0.fine_tune(2, base_lr=lr_min, freeze_epochs=2)
preds,targs = learn0.get_preds()
xs = torch.linspace(0.05,0.95,350)
accs = [accuracy_multi(preds, targs, thresh=i, sigmoid=False) for i in xs]
plt.plot(xs,accs);
I increased the number of points to search over to 350. The difference in plot shapes is obvious, albeit of limited practical importance.
interp = ClassificationInterpretation.from_learner(learn0)
interp.plot_confusion_matrix()
interp.plot_top_losses(40, nrows=10)
The model was manually tested against a series of images intentionally created to confuse it and trick it into providing false predictions. Surprisingly, the model ended up tricking me, as it fared perfectly, identifying each and every case correctly, or, at least, not wrongly.
learn0.predict('WhatsApp Image 2020-09-13 at 8.34.08 PM.jpeg')
learn0.predict('WhatsApp Image 2020-09-13 at 8.34.23 PM.jpeg')
# In this case, I passed on to it an image of a few family members of mine, and it did not provide an answer.
# That is correct, since there is no fitting category
learn0.predict('family.jpg')
learn0.predict('xr.png')
# Again, a friend of mine was not identified as a mask category. I would also consider correct a "No_mask!" prediction.
learn0.predict('boom.PNG')
# Lastly, an old-school photo of a man did not confuse the model either.
learn0.predict('t.PNG')
The model did exceptionally well in the Manual Testing phase; it did not make any wrong predictions. However, these results and the top losses propelled me to understand what is really happening: as the resolution increases, the model gets more and more inaccurate at recognizing the types of what it is searching for, and their characteristic differences.
That means that as the model gets better at figuring out how not to classify a maskless image as a type of mask, it gets worse at locating the differences between mask types. The consequence is that the model becomes more accurate and confident about things not being, rather than being, what it is searching for.
# As a clarification, these are the possible categories for the time being that the model has been trained to recognize
learn0.dls.vocab
Anyway, it's a fairly good model, with potential practical use. That's why I decided to export it.
learn0.export('masksv3_modelv01.pkl')
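For completeness, here is a sketch of loading the exported model back for inference with fastai's `load_learner`; it assumes the exported `.pkl` file and a test image (hypothetical filename) are present on disk:

```python
from fastai.vision.all import load_learner

# Reload the learner exported above; includes the DataLoaders' transforms
learn_inf = load_learner('masksv3_modelv01.pkl')

# predict returns the decoded labels, their indices, and per-category probabilities
labels, idxs, probs = learn_inf.predict('some_image.jpg')
print(labels)
```

This is essentially what the deployed Heroku application does on each upload, followed by the threshold check that may produce the error message described earlier.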
Conclusion
In this post, I covered as well as I could the process which I followed to create the "masksv3_modelv01.pkl" model.
The journey, however, is far from over. To achieve the model displayed at https://protected-mountain-28715.herokuapp.com/ as a free, available for anyone to use application, I had to recreate the dataset a few more times, as well as to explore distinct approaches and peculiar ideas (out of which the most important could be "Intended Noise"), among other things.
Make sure to get in touch with me with any suggestions, comments or corrections. A respectful and thoughtful comment is always welcome, from anyone.